Convergence of Nearest Neighbor Pattern Classification with Selective Sampling

Authors

  • Shaun N. Joseph
  • Seif Omar Abu Bakr
  • Gabriel Lugo
Abstract

In the panoply of pattern classification techniques, few enjoy the intuitive appeal and simplicity of the nearest neighbor rule: given a set of samples in some domain space whose value under some function is known, estimate the function anywhere in the domain by giving the value of the nearest sample (relative to some metric). More generally, one may use the modal value of the m nearest samples, where m ≥ 1 is some fixed integer constant, although m = 1 is known to be admissible in the sense that there is no m > 1 that is asymptotically superior in terms of prediction error [2]. The nearest neighbor rule is a nonparametric technique; that is, it does not make any assumptions about the character of the underlying function (e.g., linearity) and then proceed to estimate parameters modulo this assumption (e.g., slope and intercept). Furthermore, it is extremely general, requiring in principle only that the domain be a metric space. The classic paper on nearest neighbor pattern classification is due to Cover and Hart [2]; a textbook treatment appears in Duda et al. [4]. Both presentations adopt a probabilistic setting, demonstrating that if the samples are independent and identically distributed (iid), the probability of error converges to no more than twice the optimal probability of error, the so-called Bayes risk. In a fully deterministic setting, since the Bayes risk is zero, this amounts to showing that the nearest neighbor rule with iid sampling converges to the true pattern. Cover [1] extends these results to the estimation problem. Obviously, iid sampling is almost certain to produce samples that are superfluous in the sense that the prediction remains equally accurate even if these samples are removed.
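
The rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `nearest_neighbor_predict`, the sample layout (a list of `(point, label)` pairs), and the default Euclidean metric are all assumptions made for illustration. Any function satisfying the metric axioms could be passed as `metric`.

```python
import math
from collections import Counter


def nearest_neighbor_predict(samples, query, m=1, metric=math.dist):
    """Predict the label at `query` as the modal label of its m nearest samples.

    `samples` is a list of (point, label) pairs (a hypothetical layout chosen
    for this sketch); `metric` is any distance function on the domain.
    """
    if not 1 <= m <= len(samples):
        raise ValueError("m must satisfy 1 <= m <= number of samples")
    # Sort samples by distance to the query and keep the m closest.
    nearest = sorted(samples, key=lambda s: metric(s[0], query))[:m]
    # Take the most frequent label among them; on a tie in counts, the label
    # that appears first (i.e. belongs to the closer sample) is returned.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative data only: two small clusters with labels "a" and "b".
samples = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((0.2, 0.0), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.2), "b"),
]

label_near_origin = nearest_neighbor_predict(samples, (0.1, 0.1), m=1)
label_near_b = nearest_neighbor_predict(samples, (1.0, 1.1), m=3)
```

With this toy data, dropping the sample at `(0.2, 0.0)` leaves both predictions unchanged, which is the sense in which a sample can be superfluous.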

Similar resources

Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests

Collection of appropriate qualitative and quantitative data is necessary for proper management and planning. Using suitable inventory methods is necessary, and the accuracy of sampling methods depends on the inventory net and the number of sample points. The nearest neighbor sampling method is one of the distance methods and is calculated by three equations (Byth and Riple, 1980; Cotam and Curtis, 1956 and Cota...

Consistency of Nearest Neighbor Classification under Selective Sampling

This paper studies nearest neighbor classification in a model where unlabeled data points arrive in a stream, and the learner decides, for each one, whether to ask for its label. Are there generic ways to augment or modify any selective sampling strategy so as to ensure the consistency of the resulting nearest neighbor classifier?

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data is still an issue, and finding specific documents demands time and energy. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Journal:
  • CoRR

Volume: abs/1309.1761

Publication date: 2013